TextMarker: A Tool for Rule-Based Information Extraction
Authors
Abstract
This paper presents TEXTMARKER, a powerful toolkit for rule-based information extraction. TEXTMARKER is based on UIMA and provides versatile information processing and advanced extraction techniques. We thoroughly describe the system and its capabilities for human-like information processing and rapid prototyping of information extraction applications.
Similar resources
Test-Driven Development of Complex Information Extraction Systems using TextMarker
Information extraction is concerned with the location of specific items in textual documents. Common process models for this task use ad-hoc testing methods against a gold standard. This paper presents an approach for the test-driven development of complex information extraction systems. We propose a process model for test-driven information extraction, and discuss its implementation using the r...
Rule-Based Information Extraction for Structured Data Acquisition using TextMarker
Information extraction is concerned with the location of specific items in (unstructured) textual documents, e.g., being applied for the acquisition of structured data. Then, the acquired data can be applied for mining methods requiring structured input data, in contrast to other text mining methods that utilize a bag-of-words approach. This paper presents a semi-automatic approach for structur...
A Framework for Semi-Automatic Development of Rule-based Information Extraction Applications
For the successful processing and handling of (large scale) document collections, effective information extraction methods are essential. This paper presents a framework for the semiautomatic development of rule-based information extraction applications based on the TEXTMARKER language utilizing machine learning methods. We describe the approach in detail and present the TEXTRULER system as an ...
Integrating the Rule-Based IE Component TextMarker into UIMA
In this paper we describe the integration of the rule-based IE component TEXTMARKER into the UIMA framework. We present a conceptual overview of the TEXTMARKER system before we describe the UIMA integration in detail.
Reliability Measures Measurement under Rule-Based Fuzzy Logic Technique
In reliability theory, the reliability measures contend the very important and depreciative role for any system analysis. Measurement of reliability measures is not easy due to ambiguity and vagueness which exist within reliability parameters. It is also very difficult to incorporate a large amount of uncertainty in well-established methodologies and techniques. However, fuzzy logic provides an...